Method and System to Add Video Capability to any Voice over Internet Protocol (Vo/IP) Session Initiation Protocol (SIP) Phone

ABSTRACT

Systems and methods of configuring an add-on device to augment the capability of existing endpoint infrastructure are disclosed. In one embodiment, a video add-on device is configured to receive, augment or downgrade, and forward messages for an existing SIP audio-only phone. The video add-on device in this embodiment can receive messages from the existing SIP audio-only phone and augment the messages with information regarding the additional video capabilities being provided. The messages can then be forwarded to an infrastructure SIP Proxy/Registrar for further routing. From the perspective of the infrastructure SIP Proxy/Registrar and other network attached devices, the outbound messages from the video add-on device appear as if they originated from the video add-on device; other devices will not be directly aware of the existing SIP audio phone providing its designed function. Utilizing devices similar to the disclosed video add-on device may allow incremental corporate network endpoint upgrades.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/784,563 filed May 21, 2010 entitled “Method and System to Add Video Capability to Any Voice Over Internet Protocol (Vo/IP) Session Initiation Protocol (SIP) Phone”, which is incorporated herein by reference in its entirety.

FIELD OF DISCLOSURE

This disclosure relates generally to the field of audio and video conferencing. More particularly, but not by way of limitation, to a method of augmenting a Session Initiation Protocol (SIP) message and its corresponding Session Description Protocol (SDP) definition to allow additional capabilities while maintaining a viable interface to another device (e.g., a legacy device).

BACKGROUND

Session Initiation Protocol (SIP) is an Internet Engineering Task Force (IETF) defined signaling protocol, used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP). The protocol can be used for creating, modifying and terminating two-party (unicast) or multiparty (multicast) sessions consisting of one or several media streams. The modifications can involve changing addresses or ports, inviting more participants, adding or deleting media streams, etc. Other application examples include video conferencing, streaming multimedia distribution, instant messaging, presence information and online gaming.

The SIP protocol is an IP-based Application Layer protocol. SIP is designed to be independent of the underlying transport layer. SIP can run on Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Stream Control Transmission Protocol (SCTP). SIP is a text-based protocol (e.g., ASCII text encoded). SIP incorporates many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).

SIP employs design elements similar to the HTTP request/response transaction model. Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP and provides a readable text-based format.

SIP works in concert with several other protocols and is only involved in the signaling portion of a communication session. SIP clients typically use TCP or UDP on port numbers 5060 and/or 5061 to connect to SIP servers and other SIP endpoints. SIP is primarily used for setting up and tearing down voice or video calls. The voice and video stream communications in SIP applications are carried over another application protocol such as Real-time Transport Protocol (RTP). Parameters (e.g., port numbers, protocols, codecs) for corresponding media streams are defined and negotiated using the Session Description Protocol (SDP), which is transported in the SIP packet body. SIP and SDP are defined in the IETF Request For Comment (RFC) documents 3261 and 4566, each of which is incorporated by reference in its entirety herein.
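
As an illustration of the text-based, HTTP-like layout described above, the following minimal Python sketch splits a SIP message into its start line, header fields, and body (where the SDP is carried). The function name parse_sip_message, the example.com addresses, and the EXAMPLE message are hypothetical and not part of the disclosure; repeated header fields (e.g., Via) and header line folding are deliberately ignored.

```python
def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body).

    Simplification: a header that appears more than once keeps only its
    last value, and folded (continued) header lines are not handled.
    """
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical message used only to exercise the parser.
EXAMPLE = (
    "INVITE sip:phone2@example.com SIP/2.0\r\n"
    "To: <sip:phone2@example.com>\r\n"
    "From: <sip:phone1@example.com>\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "v=0\r\n"
)

start_line, headers, body = parse_sip_message(EXAMPLE)
```

Here start_line carries the method and request URI, headers holds the control information, and body holds the session description.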

A SIP user agent (UA) is a logical network end-point used to create or receive SIP messages and thereby manage a SIP session. A SIP UA can perform the role of a User Agent Client (UAC), which sends SIP requests, and a User Agent Server (UAS), which receives the requests and returns a SIP response. These roles of UAC and UAS typically only last for the duration of a SIP transaction. A SIP phone is a SIP UA that provides the traditional call functions of a telephone, such as dial, answer, reject, hold/unhold, and call transfer. SIP phones may be implemented by dedicated hardware controlled by the phone application directly or through a combination of hardware, software and firmware. SIP phones can be any phone with IP connectivity including traditional desktop phones, cell phones, smart phones or Personal Digital Assistants (PDAs), etc.

Each resource of a SIP network, such as a User Agent or a voicemail box, is identified by a Uniform Resource Identifier (URI), based on the general standard syntax also used in Web services and e-mail. A typical SIP URI is of the form: sip:username:password@host:port. The URI scheme used for SIP is sip:. If secure transmission is required a message may be encrypted and a scheme of sips: is used and corresponding messages are transported over Transport Layer Security (TLS).
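
The URI form given above can be illustrated with a short Python sketch that separates a SIP URI into its parts. The regular expression and the function name parse_sip_uri are assumptions made for this example; the pattern covers only the sip:username:password@host:port form and its sips: variant, and does not handle URI parameters, headers, or IPv6 host literals.

```python
import re

# Hypothetical, simplified pattern for sip:username:password@host:port.
SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^:@]+)(?::(?P<password>[^@]*))?@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?"
)


def parse_sip_uri(uri):
    """Return the scheme, user, password, host and port of a SIP URI."""
    match = SIP_URI.match(uri)
    if match is None:
        raise ValueError("not a SIP URI: " + uri)
    parts = match.groupdict()
    if parts["port"] is not None:
        parts["port"] = int(parts["port"])
    # The sips: scheme signals that transport over TLS is expected.
    parts["secure"] = parts["scheme"] == "sips"
    return parts


plain = parse_sip_uri("sip:alice:secret@example.com:5060")
secure = parse_sip_uri("sips:bob@host.example")
```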

SIP also defines server network elements as outlined in RFC 3261. A “proxy server” is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, which means its job is to ensure that a request is sent to another entity “closer” to the targeted user. Proxies are also useful for enforcing policy (e.g., making sure a user is authorized to make a call). A proxy interprets, and if necessary, rewrites specific parts of a request message before forwarding the message. A registrar is a server that accepts REGISTER requests and places the information it receives in those requests into the location service for the domain it handles. The RFC for SIP specifies that it is an important concept that the distinction between types of SIP servers is logical, not physical. In practice, different logical capabilities of SIP can be performed by one server or split across a plurality of physical devices as required by design choices.

As mentioned above, SDP is a format for describing streaming media initialization parameters in an ASCII string. SDP is intended for describing multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. SDP does not deliver media itself but is used for negotiation between end points of media type, format, and all associated properties. The set of properties and parameters are often referred to as a session profile.

A Session Description is a well defined format for conveying sufficient information to discover and participate in a multimedia session. A session is described by a series of attribute/value pairs, one per line. The attribute names are single characters, followed by “=”, and a value. Optional values are specified with “=*”. Values are either in an ASCII string, or a sequence of specific types separated by spaces. Attribute names are only unique within the associated syntactic construct, i.e., within the Session, Time, or Media only.
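
The one-pair-per-line structure described above can be sketched in Python as follows. The function name parse_sdp, the addresses, and the EXAMPLE_SDP text are hypothetical. Lines appearing before the first m= line are treated as session-level; each m= line opens a new media section, which is why an attribute name is only unique within its own section.

```python
def parse_sdp(sdp):
    """Group SDP lines into session-level pairs and per-media sections."""
    session = []
    media = []
    for line in sdp.splitlines():
        # Every SDP line has the form <single character>=<value>.
        if len(line) < 2 or line[1] != "=":
            continue
        name, value = line[0], line[2:]
        if name == "m":
            media.append({"m": value, "fields": []})
        elif media:
            media[-1]["fields"].append((name, value))
        else:
            session.append((name, value))
    return session, media


# Hypothetical audio-only description (payload type 0 is PCMU at 8000 Hz).
EXAMPLE_SDP = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)

session, media = parse_sdp(EXAMPLE_SDP)
```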

FIG. 1A shows a typical network topology (diagram 100) for a SIP based phone environment as may be found in the prior art. Network diagram 100 shows a pair of SIP phones (105, 106) connected to IP network 110 and configured for Voice over IP (VoIP) phone calls. SIP Proxy and SIP Registrar functions are provided by SIP Proxy/Registrar server 120. In this example both of these logical functions have been included with a single server 120; however, these functions may also be implemented on two distinct hardware servers.

FIG. 1B shows a timeline 150 of a typical prior art process of utilizing SIP to signal from a first phone to a second phone to establish a call utilizing example pieces of network 100. Initially (time 155), each phone will register with a SIP Registrar/Proxy server via a REGISTER message. The information for this registration can be preconfigured into the device or each device can be provisioned utilizing a mechanism similar to Dynamic Host Configuration Protocol (DHCP). After the phones have established their connection to the Proxy/Registrar infrastructure, they are each capable of making/receiving phone calls. In timeline 150, phone 1 (105) calls phone 2 (106) by sending (time 160) an INVITE message to the Proxy/Registrar server 120 with the INVITE message addressed to phone 2 (106). The Proxy/Registrar server will interrogate the message and locate/forward the INVITE message toward a network destination “closer” to phone 2 (106). Upon receipt at phone 2 (106), phone 2 (106) will respond with an OK message (time 165) if it is ready and able to accept the phone call. The INVITE message and the OK response include information about the audio capabilities of each of devices 105 and 106 such that a negotiation for a particular type of transmission of data may take place. Phone 1 (105) responds with an ACK message (time 170) to phone 2 (106) indicating how to establish the data transfer communication session for a VoIP phone call as shown at time 175.
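
The registration and call setup sequence of timeline 150 can be modeled with a toy Python sketch. This is not a SIP stack: the class ProxyRegistrar, the function call_flow, and the addresses are hypothetical, no network traffic is generated, and the OK response is shown going directly back to the caller although in practice it retraces the signaling path.

```python
class ProxyRegistrar:
    """Toy stand-in for SIP Proxy/Registrar server 120."""

    def __init__(self):
        # Location service: registered address -> current contact.
        self.location_service = {}

    def register(self, address, contact):
        """Handle a REGISTER request (time 155)."""
        self.location_service[address] = contact
        return "200 OK"

    def locate(self, address):
        return self.location_service.get(address)


def call_flow(proxy, caller, callee):
    """Return the ordered (message, sender, receiver) steps of a call."""
    steps = [("INVITE", caller, "proxy")]           # time 160
    contact = proxy.locate(callee)
    if contact is None:
        steps.append(("404 Not Found", "proxy", caller))
        return steps
    steps.append(("INVITE", "proxy", contact))      # forwarded "closer"
    steps.append(("200 OK", contact, caller))       # time 165
    steps.append(("ACK", caller, contact))          # time 170
    return steps


proxy = ProxyRegistrar()
proxy.register("sip:phone1@example.com", "192.0.2.1:5060")
proxy.register("sip:phone2@example.com", "192.0.2.2:5060")
flow = call_flow(proxy, "sip:phone1@example.com", "sip:phone2@example.com")
```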

Prior art networks such as 100 primarily consist of SIP endpoints configured for a particular function and having hardware components compatible with that particular function. Upgrading of endpoints to support enhanced functionality typically requires replacing a hardware component that is acting as an endpoint. Alternatively, there have been prior art devices which split the audio and video processing between devices; however, those devices involve two devices with required embedded information and having a private means of communication and coordination between each of the two devices. Accordingly, it is desirable to provide a method and device capable of augmenting capabilities at an existing endpoint without being required to replace a legacy (or less capable) endpoint device and without requiring a private means of communication and coordination between devices. For example, a SIP audio-only phone (e.g., 105, 106) may be augmented to a video phone while still providing its original audio-only capability by using the methods and systems disclosed herein.

SUMMARY

In one embodiment, an add-on device is added to an existing corporate network to upgrade an existing SIP audio-only phone to an endpoint supporting full audio/video conferencing capability. The add-on device can function as a transparent intermediary to the existing SIP audio-only phone. As messages are received from the SIP phone at the video add-on device they can be augmented to include video attributes. As messages are received at the video add-on device from other devices in the network they can be stripped of the video attributes and altered to only carry audio data as expected by the SIP phone. Utilizing this method and system, capabilities can be added to an existing corporate network without having to replace all of the equipment at the upgraded endpoint location. Further, the existing equipment can continue to function as it was originally designed and not require special data connections or updates.

In another embodiment, the add-on device can augment outbound messages from an H.323 device and downgrade inbound messages to the H.323 device to support a capability like video, camera control and room control in conjunction with the standard capability of audio.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows, in block diagram form, a prior art network topology for VoIP phones connected utilizing SIP.

FIG. 1B shows a timeline of steps required to establish (via SIP) a VoIP phone call.

FIG. 2 shows, in block diagram form, a network topology for augmenting a SIP based audio-only phone device to support full video conferencing based on one disclosed embodiment.

FIG. 3 shows a timeline of steps required to establish (via SIP) a video conference utilizing a video device and an audio-only phone according to one disclosed embodiment.

FIGS. 4A-B show a SIP INVITE message from an audio-only phone to a video add-on device according to one disclosed embodiment.

FIG. 5 shows, in block diagram form, a programmable control device comprising a processing unit as may be found in selected corporate IP capable devices.

DETAILED DESCRIPTION

The following disclosure describes a method and system to augment the capabilities of a SIP endpoint without replacing the existing endpoint. In one embodiment the method comprises adding an additional device between the existing endpoint and a SIP Proxy/Registrar server to provide the augmented capability. The add-on device can appear to the existing endpoint as a SIP Proxy/Registrar server and effectively act as a bridge, gateway or router to a SIP Proxy/Registrar already available in the network infrastructure. The add-on device can either add (for outbound traffic) or remove (for inbound traffic) portions of the messages that are not supported by the original SIP endpoint device.

The following disclosure is described in the context of adding video capability to an existing voice only SIP phone. Other implementations and augmentation capabilities will be apparent to those skilled in the art, given the benefit of this disclosure. For example, the same effects can be achieved with H.323 where the add-on device inserts itself in the call path via gatekeeper signaling. In addition to video, capabilities such as far end camera control, far end room control (lighting, blinds, etc.), serial pass through and application sharing could be added to the endpoint. Another example of augmentation could be the insertion of an electronic whiteboard application/device instead of or in addition to a video device. Note the add-on device is not assuming the role of a formal SIP Proxy/Registrar. Instead the end device is configured to think the add-on device is a SIP Proxy/Registrar and the add-on device can simply pass and augment the messages on their way to the real infrastructure SIP Proxy/Registrar. In this manner the add-on device can be placed transparently into the message flow.

Referring now to FIGS. 2 and 3, network diagram 200 shows an augmented version of network diagram 100 (described above) in which video add-on devices 205 and 206 have been connected to extend the technical capabilities of SIP phones 105 and 106 respectively. Timing diagram 300 outlines example steps for SIP signaling from a phone 105 to a phone 106 including a video capability provided by video add-on devices 205 and 206. Initially at time segment 310 SIP phone 105 transmits a register request which is sent to video add-on device 205. Video add-on device 205 forwards the request on to the SIP Proxy/Registrar 120. In a similar manner, SIP phone 106 transmits its request to video add-on device 206 which forwards the request on to the SIP Proxy/Registrar 120. Next, at time segment 320, OK messages are returned from each SIP Proxy/Registrar logical component to the device which sent the REGISTER message. Now devices 105, 106, 205 and 206 are ready to place or receive calls.

As mentioned above, SIP VoIP calls begin with an INVITE message as shown at time segment 330. Note that the INVITE message from SIP phone 105 routes to video device 205 and includes the audio capabilities of SIP phone 105. Video device 205 augments the INVITE message to include video capabilities of SIP video add-on device 205 and forwards the message to SIP Proxy/Registrar 120. From the viewpoint of SIP Proxy/Registrar 120, video add-on device 205 appears to be a device with both video and audio capabilities. Placement of this example call continues with SIP Proxy/Registrar 120 sending the augmented INVITE message to SIP video add-on device 206. SIP video add-on device 206 receives the INVITE message, removes attributes associated with video capabilities from the INVITE message and forwards the INVITE message (still containing audio capability information) to SIP phone 106.
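
The outbound augmentation step can be sketched in Python as appending a video media section to the audio-only session description received from the phone. This is a minimal sketch under stated assumptions: the function name augment_sdp, the port numbers, and the choice of H.263 (static RTP payload type 34, 90000 Hz clock) are illustrative only, and the disclosure does not specify which video attributes are added.

```python
def augment_sdp(audio_sdp, video_port):
    """Return the SDP with one video media section appended.

    Assumption: the add-on device offers H.263 video on video_port.
    The audio lines from the phone are passed through unchanged.
    """
    lines = [line for line in audio_sdp.split("\r\n") if line]
    lines.append("m=video %d RTP/AVP 34" % video_port)
    lines.append("a=rtpmap:34 H263/90000")
    return "\r\n".join(lines) + "\r\n"


# Hypothetical audio-only offer as it might arrive from the SIP phone.
PHONE_SDP = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

augmented = augment_sdp(PHONE_SDP, 51372)
```

Because the body grows, the Content-Length header of the enclosing SIP message would also have to be updated before forwarding.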

SIP phone 106 can respond with an OK message including its audio capabilities to facilitate negotiation of parameters for a connection. The OK message from SIP phone 106 is routed to video add-on device 206 which augments the message with supported video capabilities and forwards the augmented message to SIP Proxy/Registrar 120. SIP Proxy/Registrar 120 recognizes that this OK message is for video add-on device 205 and forwards the message toward video add-on device 205. Upon receipt, video add-on device 205 can remove and process the video-only portions of the OK message and forward the remaining portions to SIP phone 105. At time segment 350, an ACK message is routed from SIP phone 105 toward SIP phone 106 taking the required route of 205, 120 and 206. After the ACK message has been received by SIP phone 106, a video and audio phone call can take place as shown at time segment 360. Note that for the duration of this call only audio data is sent and received by SIP phones 105 and 106 in contrast to the audio/video data sent between video add-on devices 205, 206 and SIP Proxy/Registrar 120.
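
The inbound direction, in which the add-on device removes the video-only portions before forwarding to the audio-only phone, can be sketched in Python as follows. The function name split_audio_only and the sample description are hypothetical. The sketch keeps session-level lines and audio media sections, and returns the removed lines separately so that the add-on device could process them itself.

```python
def split_audio_only(sdp):
    """Return (sdp_for_phone, removed_lines).

    Session-level lines and m=audio sections are kept for the phone;
    every other media section (e.g., m=video) is removed and returned.
    """
    kept, removed = [], []
    keep = True  # lines before the first m= line are session-level
    for line in sdp.split("\r\n"):
        if not line:
            continue
        if line.startswith("m="):
            keep = line.startswith("m=audio ")
        (kept if keep else removed).append(line)
    return "\r\n".join(kept) + "\r\n", removed


# Hypothetical audio/video description arriving from the network side.
NETWORK_SDP = (
    "v=0\r\n"
    "c=IN IP4 192.0.2.20\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "m=video 51372 RTP/AVP 34\r\n"
    "a=rtpmap:34 H263/90000\r\n"
)

phone_sdp, video_lines = split_audio_only(NETWORK_SDP)
```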

Referring now to FIGS. 4A-B, an example message from a SIP phone such as 105, 106 to a video add-on device such as 205, 206 and an example augmentation are shown. Message 400 shows an INVITE message as it might look from a SIP phone to a video add-on device. Lines (410) are the lines of the message that must be changed to account for the augmentation and response routing of the message. The content length attribute of the message is an example of “control information” that may be included in a SIP message. In this example, the content length must be changed to account for extra information and the m=audio line must be modified to the IP port being used by the video add-on device. In addition to changing parameters in the message, attributes associated with the video data (e.g., far end camera controls) are added as lines 460 to the outbound message. These lines can be added/removed as the outbound/inbound messages are logically passed through the video conference add-on device 205, 206.
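
The three changes described above (rewriting the m=audio port, adding lines, and correcting the content length) can be combined in one Python sketch. The function name rewrite_invite, the port values, and the added video lines are assumptions for illustration; the actual contents of lines 410 and 460 appear only in the figures and are not reproduced here.

```python
import re


def rewrite_invite(raw, addon_audio_port, extra_sdp_lines):
    """Rewrite an outbound INVITE as the add-on device might.

    1. Point the m=audio line at the add-on device's port.
    2. Append the extra (e.g., video) SDP lines.
    3. Recompute Content-Length so it matches the new body.
    Assumes the body already ends with CRLF.
    """
    head, _, body = raw.partition("\r\n\r\n")
    body = re.sub(r"^m=audio \d+", "m=audio %d" % addon_audio_port,
                  body, count=1, flags=re.MULTILINE)
    body += "".join(line + "\r\n" for line in extra_sdp_lines)
    length = len(body.encode("utf-8"))
    head = re.sub(r"(?im)^Content-Length:[ \t]*\d+",
                  "Content-Length: %d" % length, head)
    return head + "\r\n\r\n" + body


# Hypothetical INVITE from the audio-only phone.
_BODY = "v=0\r\nm=audio 49170 RTP/AVP 0\r\n"
ORIGINAL = (
    "INVITE sip:phone2@example.com SIP/2.0\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: %d\r\n"
    "\r\n" % len(_BODY)
) + _BODY

rewritten = rewrite_invite(
    ORIGINAL, 50000,
    ["m=video 51372 RTP/AVP 34", "a=rtpmap:34 H263/90000"],
)
```

The inbound direction is the mirror image: the added lines are removed and the content length is reduced accordingly.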

Although the above embodiments primarily deal with a SIP based phone, the disclosed method and system could also be implemented for an H.323 environment. In addition, the disclosed system and method could be used wherever SDP is used. SDP is also widely used with streaming of media with the Session Announcement Protocol (SAP) and Real-Time Streaming Protocol (RTSP). How the device inserts itself into the message path would be different for each protocol but would be understood by those of ordinary skill in the art, given the benefit of this disclosure, without requiring undue experimentation.

Referring now to FIG. 5, an exemplary conferencing device 500 is shown. Exemplary conferencing device 500 comprises a programmable control device 510 which may be optionally connected to input 560 (e.g., keyboard, mouse, touch screen, etc.), display 570 or program storage device 580. Also, included with program device 510 is a network interface 540 for communication via a network with other conferencing and corporate infrastructure devices (not shown). Note network interface 540 may be included within programmable control device 510 or be external to programmable control device 510. In either case, programmable control device 510 will be communicatively coupled to network interface 540. Also note program storage unit 580 represents any form of non-volatile storage including, but not limited to, all forms of optical and magnetic storage elements including solid-state storage.

Program control device 510 may be included in a conferencing device and be programmed to perform methods in accordance with this disclosure (e.g., those illustrated in FIGS. 3-4). Program control device 510 comprises a processor unit (PU) 520, input-output (I/O) interface 550 and memory 530. Processing unit 520 may include any programmable controller device including, for example, the Intel Core®, Pentium® and Celeron® processor families from Intel and the Cortex and ARM processor families from ARM. (INTEL CORE, PENTIUM and CELERON are registered trademarks of the Intel Corporation. CORTEX is a registered trademark of the ARM Limited Corporation. ARM is a registered trademark of the ARM Limited Company.) Memory 530 may include one or more memory modules and comprise random access memory (RAM), read only memory (ROM), programmable read only memory (PROM), programmable read-write memory, and solid state memory. One of ordinary skill in the art will also recognize that PU 520 may also include some internal memory including, for example, cache memory.

Various changes in the materials, components, circuit elements, as well as in the details of the illustrated operational methods are possible without departing from the scope of the following claims. For instance, acts in accordance with FIGS. 2 and 4 may be performed by a programmable control device executing instructions organized into one or more modules (comprised of computer program code or instructions). A programmable control device may be a single computer processor (e.g., PU 520), a plurality of computer processors coupled by a communications link or one or more special purpose processors (e.g., a digital signal processor, DSP). Such a programmable control device may be one element in a larger data processing system such as a general purpose computer system. Storage media, as embodied in storage devices such as 580, as well as memory internal to program control device 510, suitable for tangibly embodying computer program instructions include, but are not limited to: magnetic disks (fixed, floppy, and removable) and tape; optical media such as CD-ROMs and digital video disks (DVDs); and semiconductor memory devices such as Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Gate Arrays and flash devices. These are also sometimes referred to as computer readable medium or program storage devices.

In the above detailed description, various features are occasionally grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim.

Various changes in the details of the illustrated operational methods are possible without departing from the scope of the following claims. For instance, time line steps of FIGS. 2 and 4 may perform the identified steps in an order different from that disclosed here. Alternatively, some embodiments may combine the activities described herein as being separate steps. Similarly, one or more of the described steps may be omitted, depending upon the specific operational environment the method is being implemented in. In addition, acts in accordance with FIGS. 2 and 4 may be performed by a programmable control device executing instructions organized into one or more program modules. A programmable control device may be a single computer processor, a special purpose processor (e.g., a digital signal processor, “DSP”), a plurality of processors coupled by a communications link or a custom designed state machine. Custom designed state machines may be embodied in a hardware device such as an integrated circuit including, but not limited to, application specific integrated circuits (“ASICs”) or field programmable gate arrays (“FPGAs”).

It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”

What is claimed is:
 1. A method of supporting backward compatibility of networked endpoint devices comprising: receiving, at an add-on device, a message from a first endpoint device directed to a second endpoint device wherein the message indicates support of a first set of one or more capabilities and support of a second set of one or more capabilities; modifying the message at the add-on device by removing information about the second set of capabilities, the second set of capabilities including an additional capability supported by the add-on device, the additional capability not supported by the second endpoint device; altering attributes of the message to make the message appear to other devices that the message originated at the add-on device; altering control attributes of the message to be consistent with the removing of information; and transmitting the modified message from the add-on device to the second endpoint device.
 2. The method of claim 1 wherein the second endpoint device comprises an audio-only device.
 3. The method of claim 2 where the audio-only device supports Session Initiation Protocol (SIP).
 4. The method of claim 1 wherein the second endpoint device supports H.323 protocol.
 5. The method of claim 1 wherein the additional capability includes support for data streams comprising video data.
 6. The method of claim 4 wherein the second endpoint device comprises an audio-only phone.
 7. The method of claim 1 wherein the additional capability includes support for data streams comprising electronic whiteboard capabilities.
 8. The method of claim 1 wherein the first endpoint device is configured for identifying the add-on device as an outbound recipient of a data stream from the endpoint device.
 9. A method comprising: receiving, at an add-on device, a message directed to a SIP endpoint wherein the message supports a first set of one or more capabilities and a second set of one or more capabilities; modifying the message at the add-on device by removing information about the second set of capabilities, the second set of capabilities including an additional capability supported by the add-on device, the additional capability not supported by the SIP endpoint device; altering attributes of the message to make the message appear to other devices that the message originated at the add-on device; altering control attributes of the message in accordance with the removing of information; and transmitting the modified message from the add-on device to the SIP endpoint device.
 10. The method of claim 9 wherein the SIP endpoint device is an audio-only phone.
 11. The method of claim 9 wherein the additional capability includes support for data streams comprising video data.
 12. The method of claim 11 wherein the SIP endpoint device is an audio-only phone.
 13. The method of claim 9 wherein the additional capability includes support for data streams comprising electronic whiteboard capabilities.
 14. An add-on device comprising: a network interface communicatively coupled to a network; and a programmable control device communicatively coupled to the network interface and programmed to: receive a message from an endpoint device wherein the message supports a first set of one or more capabilities, supports a second set of one or more capabilities, and identifies a specific destination; modify the message to remove information about a second set of capabilities, the second set of capabilities including an additional capability supported by the add-on device, the additional capability not supported by a second endpoint device at the specific destination; alter attributes of the message to make the message appear to other devices that the message originated at the add-on device; alter control attributes of the message to be consistent with the removal of information; and transmit the modified message from the add-on device to the second endpoint device.
 15. The add-on device of claim 14 wherein the add-on device supports the Session Initiation Protocol (SIP).
 16. The add-on device of claim 14 wherein the add-on device supports the H.323 protocol.
 17. The add-on device of claim 14 wherein the second set of capabilities comprises video data capabilities.
 18. The add-on device of claim 14 wherein the second set of capabilities comprises electronic whiteboard capabilities.
 19. The add-on device of claim 14 further comprising: a video device capable of augmenting the capabilities of one or more endpoint devices coupled to the network.
 20. The add-on device of claim 19 wherein the video device comprises a video input device.